Add properties in Evaluation Result - Custom Evaluator extra fields. #46077
Conversation
Pull request overview
Adds support for passing through custom evaluator “extra fields” via a properties bag into the AOAI-style evaluation result objects produced by the evaluation results converter.
Changes:
- Update `_extract_metric_values` to detect an `outputs.<criteria>.properties` dict and propagate it onto per-metric extracted values.
- Update `_create_result_object` to include `properties` in the final AOAI result payload when present.
- Add a unit test asserting `properties` is preserved and not flattened into the top-level result object.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py | Propagates a per-criteria properties dict into per-metric result objects during AOAI conversion. |
| sdk/evaluation/azure-ai-evaluation/tests/unittests/test_evaluate.py | Adds coverage validating properties passthrough behavior for a custom evaluator result row. |
feat(evaluation): support properties passthrough in AOAI evaluation results
Pass through evaluator properties dict in AOAI evaluation results. When an evaluator returns a properties dict, it is included alongside score, label, reason, threshold, and passed in the result object. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update _extract_metric_values and _create_result_object docstrings to document the new properties field and its expected dict type. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address PR review: warn users when their custom evaluator returns 'properties' as a non-dict type so they can fix the output format. Also add properties to _create_result_object example input. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…thub.com/Azure/azure-sdk-for-python into waqasjaved02/aoai-properties-passthrough
Remove erroneous space in self._eval_metric. value (two occurrences) that would cause an AttributeError at runtime when building result keys for _details and _total_tokens fields. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nagkumar91
left a comment
Review — Properties passthrough
- Shared dict reference (bug-risk) — In `_extract_metric_values`, the same `properties` object is assigned to every metric entry:

  ```python
  for metric_dict in result_per_metric.values():
      metric_dict["properties"] = properties  # same object reference
  ```

  If anything downstream mutates one entry's properties, all entries are affected. Consider `metric_dict["properties"] = properties.copy()` (or `copy.deepcopy` if nested dicts matter).
- No test for the warning path — The `isinstance(metric_value, dict)` guard logs a warning when `properties` isn't a dict, but no test covers this branch. A quick test passing `properties="not_a_dict"` would confirm the warning fires and `properties` is omitted.
- Typo fixes in `_base_rai_svc_eval.py` — Good catch on `self._eval_metric. value` → `self._eval_metric.value`. Cosmetic (Python allows whitespace after the dot) but worth cleaning up.
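The shared-reference concern raised in the review can be demonstrated with a standalone snippet (variable names here are illustrative, not the SDK's):

```python
import copy

properties = {"detail": "original", "nested": {"count": 1}}
result_per_metric = {"accuracy": {}, "fluency": {}}

# Shared reference: every metric entry points at the very same dict.
for metric_dict in result_per_metric.values():
    metric_dict["properties"] = properties

result_per_metric["accuracy"]["properties"]["detail"] = "mutated"
# The mutation leaks into every other entry:
assert result_per_metric["fluency"]["properties"]["detail"] == "mutated"

# A shallow copy per entry isolates top-level keys...
isolated = {name: {"properties": properties.copy()} for name in result_per_metric}
isolated["accuracy"]["properties"]["detail"] = "a-only"
assert isolated["fluency"]["properties"]["detail"] == "mutated"  # unaffected

# ...but nested dicts are still shared; copy.deepcopy breaks that link too.
deep = {name: {"properties": copy.deepcopy(properties)} for name in result_per_metric}
deep["accuracy"]["properties"]["nested"]["count"] = 99
assert deep["fluency"]["properties"]["nested"]["count"] == 1
```

This is why the suggestion distinguishes `properties.copy()` (enough when the bag holds only scalars) from `copy.deepcopy` (needed if the bag nests mutable values).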
Address PR feedback: copy properties dict and add non-dict test
- Use properties.copy() to avoid shared dict reference across metrics
- Add test for non-dict properties logging and omission
- Change properties type mismatch log level from warning to info
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nagkumar91
left a comment
All review items addressed — properties.copy(), non-dict test coverage, and log level adjustment. LGTM.
Add properties in Evaluation Result - Custom Evaluator extra fields. (#46077)

* feat(evaluation): support properties passthrough in AOAI evaluation results. Pass through evaluator properties dict in AOAI evaluation results. When an evaluator returns a properties dict, it is included alongside score, label, reason, threshold, and passed in the result object.
* docs: update docstrings for properties passthrough per PR review. Update _extract_metric_values and _create_result_object docstrings to document the new properties field and its expected dict type.
* fix: log warning when properties is not a dict. Address PR review: warn users when their custom evaluator returns 'properties' as a non-dict type so they can fix the output format. Also add properties to _create_result_object example input.
* Release-1-16-4
* Fix stray space in _eval_metric.value attribute access. Remove erroneous space in self._eval_metric. value (two occurrences) that would cause an AttributeError at runtime when building result keys for _details and _total_tokens fields.
* Remove empty changelog sections to fix Build Analyze check
* Address PR feedback: copy properties dict and add non-dict test. Use properties.copy() to avoid shared dict reference across metrics; add test for non-dict properties logging and omission; change properties type mismatch log level from warning to info.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Sydney Lister <sydneylister@microsoft.com>
Evaluation Result is OpenAI compliant. It contains score, label, reason, etc.
We need to surface more evaluation results in the UI, so we are introducing this property bag, in which we can add more details. The science team is also going to use this property bag to expose additional outputs per evaluator.
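As an illustration, a custom evaluator that populates such a property bag might look like the following (the class name, scoring heuristic, and property keys are all hypothetical, not part of the SDK):

```python
class CustomCoherenceEvaluator:
    """Hypothetical custom evaluator: returns the standard fields
    (score, label, reason, threshold, passed) plus a 'properties'
    bag carrying extra per-evaluation outputs for the UI."""

    THRESHOLD = 0.5

    def __call__(self, *, response: str, **kwargs) -> dict:
        word_count = len(response.split())
        score = min(word_count / 10.0, 1.0)  # toy length-based heuristic
        return {
            "score": score,
            "label": "pass" if score >= self.THRESHOLD else "fail",
            "reason": "Toy length-based coherence heuristic.",
            "threshold": self.THRESHOLD,
            "passed": score >= self.THRESHOLD,
            # Extra details surfaced via the property bag:
            "properties": {
                "word_count": word_count,
            },
        }
```

With the change in this PR, the converter carries `properties` through into the AOAI-style result object alongside the standard fields rather than flattening its keys into the top level.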